-
-
Notifications
You must be signed in to change notification settings - Fork 146
Improve optimizer for polymorphism and connections: post-fetch prefetching, InheritanceManager support, and fewer N+1s #808
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
… prefetch path rewriting
…subtype-specific fields to validate query structure
…els and schema for ProjectNote and ArtProjectNote (contain missing optimisations)
…d ensure valid query paths
…ions and subtype-specific prefetching
…s, including edge cases for unknown relations and skipped hints
…orphic relations with postfetch handling
…eliable postfetch batching with polymorphic relations
…ance manager relay
…d connections and managers
…ry optimization for nested polymorphic relations
…nhance postfetch optimization for polymorphic relations
…r polymorphic relations with inline fragments and FK chaining
…ss nested relations and improve polymorphic relation handling
…g with new polymorphic relations
… subclasses with selected fields/prefetches are included, and enhance nested prefetch path resolution
…polymorphic relations with inline fragments and FK chains
…ith English equivalents for consistency and clarity in polymorphic relation tests
…fy `default_qs_hook` logic by delegating postfetch optimizations
…tests to validate optimized query handling and N+1 prevention across various polymorphic relations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry @stygmate, your pull request is larger than the review limit of 150000 diff characters
for more information, see https://pre-commit.ci
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #808 +/- ##
==========================================
- Coverage 89.71% 87.60% -2.11%
==========================================
Files 45 46 +1
Lines 4298 4882 +584
==========================================
+ Hits 3856 4277 +421
- Misses 442 605 +163 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
I realized that under certain conditions, evaluating a queryset was loading all the records from the tables. I’ll look for fixes. |
…full, first, last) to validate query optimization and ensure stable query counts
…al connection pages and remove redundant queryset evaluation
for more information, see https://pre-commit.ci
…d postfetch logic Streamline error handling by replacing generic exceptions with specific ones where applicable, utilize `contextlib.suppress` for cleaner code, and improve import clarity for postfetch utilities.
…c and reduced code duplication Simplify optimizer by moving nested logic into helper functions and reusing shared utilities for prefetch path extraction, rewriting, and inheritance handling. Streamline postfetch logic with a dedicated `__prefetch_child_root` function.
…tests and improve docstring clarity
…d upgrade Relay nodes for consistency
for more information, see https://pre-commit.ci
|
I fixed the bug I detected earlier today. I also did quite a bit of refactoring to meet Pyright and Ruff requirements and to make my changes clearer. |
…orphic relation tests to improve clarity and simplify code
…using backslash for multi-line statement
for more information, see https://pre-commit.ci
…tion tests for clarity and consistency
|
I have code branches that aren’t covered by tests. Given the complexity of handling optimizations related to polymorphism, I worked iteratively (hence the large number of commits), and some parts of the code may have become unnecessary. That said, I’m seeing very promising results. |
… implement tests for excessive materialization and postfetch optimization in both Relay and standard polymorphism scenarios.
… resolution strategies
…er usage Simplify error handling with `contextlib.suppress`, replace private `_db` attribute with public `db` property, and enhance code clarity in pagination and total count resolution logic.
…up docstrings in polymorphism tests Normalize variable naming (`N` to `n`), enhance assertions with additional checks for data presence, and refine docstring formatting for clarity and consistency across test modules.
…-database setups Improve handling of database aliases (`db_alias`) in postfetch logic to ensure correct querying across multiple databases. Adjust related instance grouping, manager usage, and `_prefetched_objects_cache` injection logic for robustness.
…tfetch logic Replace exception handling with conditional checks for `using` method and streamline alias assignment logic by removing redundant `contextlib.suppress` blocks in database alias determination. Improve code clarity and maintainability.
for more information, see https://pre-commit.ci
bellini666
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some initial comments, but still need to take another look after.
Also, I hate doing this, but would it be possible to split this into multiple PRs? We have so much going on here that it is really hard to understand what is going on in all the changes
| # If a projects query exists, ensure it does NOT batch across multiple company ids. | ||
| # It's acceptable that no projects query is executed if data was served from cache | ||
| # after page-level postfetch populated it. | ||
| if projects_sql: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: does this if makes sense in a test context? I mean, this should be deterministic, unless this was a pytest parametrized value
| # Collect all details.text under ArtProjectType nodes | ||
| details_texts = set() | ||
| for c_edge in companies: | ||
| company_node = c_edge.get("node") or {} | ||
| projects_conn = company_node.get("projects") or {} | ||
| for p_edge in projects_conn.get("edges", []): | ||
| node = (p_edge or {}).get("node") or {} | ||
| if node.get("__typename") != "ArtProjectType": | ||
| continue | ||
| art_notes_conn = node.get("artNotes") or {} | ||
| for n_edge in art_notes_conn.get("edges", []): | ||
| note_node = (n_edge or {}).get("node") or {} | ||
| details_conn = note_node.get("details") or {} | ||
| for d_edge in details_conn.get("edges", []): | ||
| d_node = (d_edge or {}).get("node") or {} | ||
| text = d_node.get("text") | ||
| if text: | ||
| details_texts.add(text) | ||
|
|
||
| assert {"d11", "d12", "d21"}.issubset(details_texts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: maybe we should just do a simpler assert result.data == {...}? Seems easier to visualize the output, and the code is less complex
| # If a projects query exists, ensure it does NOT batch across multiple company ids. | ||
| # It's acceptable that no projects query is executed if data was served from cache | ||
| # after page-level postfetch populated it. | ||
| if projects_sql: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: I started reviews from bottom to top. Ditto from same suggestion in the other similar test
| assert isinstance(companies, list) | ||
| assert companies | ||
| art_projects = [ | ||
| p for p in companies[0]["projects"] if p["__typename"] == "ArtProjectType" | ||
| ] | ||
| details_texts = { | ||
| d["text"] | ||
| for p in art_projects | ||
| for n in p.get("artNotes", []) | ||
| for d in n.get("details", []) | ||
| } | ||
| assert {"d11", "d12", "d21"}.issubset(details_texts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: I commented about this below, I think just doing result.data == {...} is better
| # If a projects query exists, ensure it does NOT batch across multiple company ids. | ||
| # It's acceptable that no projects query is executed if data was served from cache | ||
| # after page-level postfetch populated it. | ||
| if projects_sql: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: ditto below
| from strawberry_django.queryset import ( | ||
| get_queryset_config as _get_qs_cfg, | ||
| ) | ||
|
|
||
| cfg = _get_qs_cfg(qs) | ||
| if not getattr(cfg, "postfetch_prefetch", None): | ||
| return cache[use_cache_key] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: Ditto
Also, can't we just import those at module-level? (don't remember if they would cause import loops)
|
|
||
| # Helper to apply early SQL pagination for simple forward root connections | ||
| def _apply_early_pagination(qs: models.QuerySet): | ||
| import contextlib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: this should be imported at module level
| # Build ORDER BY expressions as Django does for window pagination | ||
| from django.db import DEFAULT_DB_ALIAS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: ditto
| # Annotate row number and total count (global partition) | ||
| from django.db.models import Count, Window | ||
| from django.db.models.functions import RowNumber |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: ditto
| qs = qs.annotate( | ||
| _strawberry_row_number=Window( | ||
| RowNumber(), | ||
| partition_by=None, | ||
| order_by=order_by_exprs, | ||
| ), | ||
| _strawberry_total_count=Window( | ||
| Count(1), | ||
| partition_by=None, | ||
| ), | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: this is duplicating some code that we have on pagination module. Also, maybe this whole function should be there, as it isnot related to the base field itself
Summary
This PR substantially improves the Django optimizer to better support polymorphic models, nested reverse relations, and Relay connections. It introduces a post-fetch prefetching pass that batches reverse relations after queryset evaluation and injects them into Django’s
_prefetched_objects_cache. It also adds dedicated handling forInheritanceManager(django-polymorphic) and enhances Relay connection resolution to avoid extra queries (includingtotalCount).Key changes
strawberry_django/postfetch.pyand integration intodefault_qs_hook.select_subclassesso subclass instances get correct lookups and avoid redundant queries.totalCountvia a windowCount(1)in the main query when requested, avoiding a separatecount()query._prefetched_objects_cacheon the source instance..get()when a prefetched result cache is present to prevent LIMIT queries discarding prefetched data.docs/guide/optimizer.md.Configuration
OptimizerConfigretains existing flags (enable_only,enable_select_related,enable_prefetch_related,enable_annotate,enable_nested_relations_prefetch) and adds support pathways forpostfetch_prefetchand parent-branch hints carried viaOptimizerStore.Performance and behavior
InheritanceManager.select_subclassesby rewriting prefetch paths relative to the subclass.count()queries fortotalCountby using SQL window functions when the field is selected.Related issues
Notes
result_cache).tests/polymorphism*,tests/polymorphism_relay*, andtests/polymorphism_inheritancemanager*.Review note
This one’s a heavyweight and touches the optimizer’s core. It needs a reviewer who knows the library well and can take the time for a slow, careful pass @bellini666 ☕🙂. Also, if anyone wants to experiment with this version in their projects, please do—real-world mileage reports are pure gold.